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ABSTRACT 

Spatial Independent Components Analysis (ICA) is increas- 
ingly used in the context of functional Magnetic Resonance 
Imaging (fMRI) to study cognition and brain pathologies. 
Salient features present in some of the extracted Independent 
Components (ICs) can be interpreted as brain networks, but 
the segmentation of the corresponding regions from ICs is 
still ill-controlled. Here we propose a new ICA-based pro- 
cedure for extraction of sparse features from fMRI datasets. 
Specifically, we introduce a new thresholding procedure that 
controls the deviation from isotropy in the ICA mixing model. 
Unlike current heuristics, our procedure guarantees an exact, 
possibly conservative, level of specificity in feature detec- 
tion. We evaluate the sensitivity and specificity of the method 
on synthetic and fMRI data and show that it outperforms 
state-of-the-art approaches. 

Index Terms — ICA, fMRI, ROC, sparse models. 



1. INTRODUCTION 

In neuro-imaging, ICA is the most popular method to ex- 
plore the spatial correlation structure of fMRI signals. Some 
extracted ICs match well-known brain networks (TJ [2l and 
have been shown to correspond to units targeted by neuro- 
degenerate diseases |3|. These sources form spatial maps that 
represent sparse networks of brain activity: only a small per- 
centage of the voxels observed are active in a given network. 
Daubechies et al [4J have argued that this sparsity is key to 
the success of ICA in the context of fMRI. When applied to 
data generated from sparse sources, ICA amounts to sparse 
coding |5|. It has enjoyed more success in the neuro-imaging 
community, probably because it groups together correlated 
features into components interpreted as brain networks. Cur- 
rent state-of-the-art ICA models for fMRI (MELODIC |2|) 
apply univariate mixture models to ICs to separate signal from 
noise and recover the sparse structure. 

In this paper, we present a multivariate model of sparse 
brain activity and an associated procedure for recovering the 
sparse features with a statistical control of false detections in 
the presence of noise. We will focus on single- subject analy- 



sis but the method could easily be extended to group analysis 
with the addition of a group model. 

2. SIGNAL MODELING AND ESTIMATION 

ICA is an unsupervised learning algorithm. As such, it does 
not provide a framework for statistical- significance testing, 
but can be used to analyze fMRI data without external corre- 
lates, such as in resting state. We introduce a model of the 
fMRI signal based on the assumption of very sparse sources. 

Generative model. In the observations from the scanner 
yQ, the underlying BOLD dynamics is confounded by obser- 
vation noise F. As with most fMRI ICA analysis procedures, 
we assume that the signal of interest spans only a sub-space 
of the observation space. Components C spanning this sub- 
space can be estimated using probabilistic principal compo- 
nent analysis (PC A) O, which assumes F to be Gaussian- 
distributed, and lying in a subspace orthogonal to C: 



Y = WC + F, 



(1) 



where Y and F are (ntime steps, ^voxels) matrices, the rows of 
which form pattern vectors. C is the (ncomponents, ^voxels) pat- 
tern matrix of the retained principal components and W is the 
matrix of their loadings in the observed signal. In this paper, 
we do not discuss estimation of the sub-space of interest, but 
focus on recovering sparse brain- activity sources from C. 

We model the patterns C as generated by a set of sources 
A, confounded by additive noise E, and observed as a linear 
mixture in the sub-space spanned by C: 



C = MB 



(2) 



B = A + E 



(3) 



M is an orthogonal mixing matrix. A, B, and E are 
(^components, ^voxels) matrices. Unlike F, E is in the same 
sub-space as the brain sources. In addition, we assume that 
the true sources correspond to the marginals A^ that are 
sparse: most of the coefficients of A^ are zeros. As a result, 
the histogram of A^ is strongly super-Gaussian: it has heavy 
tails. If the amplitude of the noise E is small compared to the 
non-zero coefficients of A, B is also super-Gaussian and can 
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^ Y corresponds to the data from the scanner after slice-timing interpola- 
tion and motion correction. In addition, when doing group analysis, a normal- 
ization procedure is often applied, followed by Gaussian spatial smoothing. 



be estimated from C using ICA. We use FastICA, a proce- 
dure that selects a basis of the signal sub- space maximizing 
non-Gaussianity of the corresponding marginal distributions 

If the components B are observed mixed, the observed 
projections reflect mostly the isotropic noise E and not 
the sources of interest A that are sparse only in a particular 
basis. This is why the estimation of the mixing model (2) 
is important for fMRI data analysis, as the marginals on the 
estimated basis separate A from the background noise E. 

Thresholding ICs to control for noise. We assume that 
the values of the non-zero voxels of sources A are larger than 
the standard deviation a of E. According to our model, select- 
ing voxels specific of the support of A amounts to choosing a 
threshold to apply on the ICs B. ICA estimates particular 
directions of the feature space, thus a possible null hypothesis 
Hq for ICA is that all directions are equivalent. As a result, 
the null distribution for the marginals A^ is given by projec- 
tions on random directions uj of the feature space. 

p{Ai > ra\Ho) = mean p{\uj^B\ > t^) (4) 

a;, ||a;|| = l 

We can sample this distribution directly from the data. In 
addition, cj^B is a linear combination of the random variables 
B^. As the sub-space has been whitened by the PCA, they all 
have a variance of 1. For high dimensions, the central limit 
theorem thus states that the distribution of u^'B is Gaussian 
of variance 1 . In this case, the p-value is given by the inverse 
of the cumulative distribution function of a Gaussian process, 
and the threshold can be set as with a normal null. 

A representation of the signal in feature space is given 
on Fig. [T] for various distributions: synthetic data generated 
from the model exposed above (Fig. [1^), synthetic data with 
additional super-Gaussian noise, (Fig. [TJ)), and fMRI data 
(Fig.[T]:). All share a central mode corresponding to E in our 
description, that can be approximated as a multivariate Gaus- 
sian process. In addition, for each mixing direction, activated 
voxels can be found when moving away from the center. 

Our model is different from most noisy ICA models, as 
they assume that contribution of the noise to the signal sub- 
space is small. They account for the noise in the ICA estima- 
tion by correcting the bias it introduces to the whitening and 
the measures of statistical independence Q. In our model, 
noise accounts for a large fraction of the variance in the sig- 
nal sub- space. 

3. SIMULATION STUDY 

We generate synthetic samples Y from our model with a 
known ground truth and noise model. We consider 9 features 
A, that is 2D maps (80, 80) pixel large and made of one 
or two rectangles of uniformly-active pixels on a null back- 
ground. We add random noise E generated by a multivariate 
normal distribution of isotropic variance 1. We control the 




Fig. 1. Scatter plot of samples projected in the subspace 
spanned by the two first ICs identified. The density is rep- 
resented by a colormap ranging from black (low density) to 
white (high density). The threshold as set by the model with 
p = 10 ~^ is represented as a light blue circle, (a) Simulated 
data with 9 features total, with E generated from a Gaussian 
process with a = 0.15. (b) Same simulations with super- 
Gaussian noise (kurtosis of 4). (c) fMRI data. 

amplitude of the noise with a parameter A: B = A + AE. 
We draw a random rotation matrix M to mix the patterns B, 
and apply a Gaussian spatial smoothing of FWHM 2 pixels 
to simulation the point spread function of the scanner. Due 
to the smoothing, the noise term is observed as a random 
Gaussian field with a reduced variance compared to the initial 
random process. We set A to control the variance of this field. 

In addition, as it is likely that, in real fMRI settings, not all 
background noise can be described by Gaussian processes, we 
generate synthetic data with non-Gaussian noise. For this, in 
addition to the previous Gaussian random field. Eg, we gen- 
erate a super-Gaussian contribution Eng by applying a non- 
linear rescaling to a smoothed Gaussian random field. We use 
the cubic non-linearity that generates spiky noise with a long- 
tailed distribution. The additional noise term is thus sparse, 
and not invariant by rotation of the feature space. We add 
it to the signal of interest in the observation basis. We set 
the contributions of both noise terms to control the variance 
and kurtosis of the resulting random process: C = M A + 
A (cos 6> M Eg + sin 6> Eng). This structured noise term vio- 
lates the noise model of the ICA algorithm and poses thus a 
challenge to the feature extraction by offsetting the estimation 
of the mixing matrix, and thus the projection. 

Spatial maps generated by the simulations are presented 
on Fig. [21 The samples projected in feature space on the 
2 first ICs are presented on Fig. \T\ We apply ICA estima- 
tion and thresholding as described above. To quantify the 
specificity and the sensitivity in feature detection, we plot 
receiver-operator characteristics on Fig. |4] for Gaussian and 
super-Gaussian (kurtosis = 4) noise. Increasing noise ampli- 
tude a degrades estimation performance, as the central mode 
becomes indistinguishable from the outliers we are interested 
in. Performances are slightly degraded by the addition of 
the super-Gaussian noise. It induces errors in the choice of 
the projection basis, as can be seen on Fig. [H): in the pro- 
jected space, sources are not completely unmixed. In addi- 
tion, on Tab. [H we compare false positive rates to the spec- 
ified p-value. We find that for Gaussian noise amplitudes up 
to cr = 0.20 or super-Gaussian noise amplitude of a = 0.15, 
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Fig. 2. Simulated data, showing 5 samples out of 9, for E 
generated from a super-Gaussian process with a = 0.15 and 
a kurtosis of 4. The threshold is set by the model with p = 
10~^. Top row: observed samples Y. Middle row: ICs 
B. Bottom row: estimated sources A, the ground truth is 
outlined in light yellow. 
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Table 1. False positive rates as a function of model-based 
p-value, for simulated and fMRI data. 

the p- values give an exact control on type 1 errors. With more 
noise, the tail of the central mode cannot account for all false 
detections for small p- values. We stipulate that the additional 
errors come from projection error due to incomplete source 
unmixing by the ICA procedure. 

4. FMRI STUDY 

We apply our method to fMRI data for 12 subjects at rest 
from a previous study |8|. 820 volumes were acquired with 
a repetition time (TR) of 1.5 s. We run the procedure (ICA 
analysis and thresholding) for single- subject data on the first 
40 principal components. For fMRI data, the ground truth is 
not known, so we generate degraded datasets from the origi- 
nal dataset, and consider the latter as a pseudo ground truth to 
quantify error rates. This procedure quantifies consistency of 
the estimator in the presence of noise. To generate degraded 
datasets while retaining observations of the same brain activ- 
ity, we use one volume out of 3. The effective TR of the down- 
sampled datasets is 4.5 s. This sampling rate is enough to 
retain most of the hemodynamic response, convolved by the 
6-second-long response function. In addition, the 3 resulting 
interleaved time series sample different high-frequency noise 
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Fig. 4. ROC plot: sensitivity as a function of false posi- 
tive rate for synthetic data using Gaussian and super-Gaussian 
(kurtosis = 4) noise of varying a, as well as for fMRI data. 

that confounds the signal of interest. Thresholded ICs esti- 
mated on the various resampled datasets for one subject are 
matched with the corresponding pseudo ground truth. Fig. [3] 
presents pseudo ground truth and downsampled data. On non- 
thresholded ICs, we can see that the level of background noise 
is indeed higher in ICs learned on downsampled data. We run 
the MELODIC mixing model on the ICs to compare sensitiv- 
ity (false negatives) and specificity (false positives). 

As seen on the ROC plot (Fig.|4l), average performance on 
fMRI data for the 12 subjects is on par with simulated data. 
Good control of false positives can be achieved, but the true 
positive rate remains limited. This can be explained by er- 
rors in our pseudo-ground truth. In addition, the false positive 
rate is controlled by the specified p-value only to 10~^, al- 
though to account for errors in the pseudo-ground truth, the 
observed false positive rate should be corrected by a factor 
0.5. With MELODIC 's mixture model, we specify different 
inter-class mixing probability ratios to vary specificity; we do 
not report on very large or very small ratios as they induce 
non-monotonous thresholding and poor overall performance. 
Our multivariate thresholding proceeding can achieve better 
specificity/sensitivity trade off MELODIC's mixture model. 

ICs estimated on fMRI data most often display a few 
salient features related to anatomical regions and may be 
interpreted as brain networks. On such IC, both our thresh- 
olding procedure and MELODIC's mixture model extract 
similar regions, although our procedure yields fewer small 
clusters outside of the main segmented areas (see Fig. [51 top). 
In contrast, some ICs, representative of non-cognitive pro- 
cesses such as blood flow or movement, are very fragmented 
and diffuse with no region strongly standing out. On these 
ICs, a mixture model fits the null distribution to the center 
of the histogram, and thus selects large regions, whereas our 
thresholding procedure selects very few voxels, as it does not 
consider the component by itself, but as part of the complete 
multivariate signal (see Fig.O bottom). 
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Fig. 3. ICs estimated from fMRI data and thresholded using MELODIC's mixture model, and our multivariate thresholding 
procedure. Top rows: IC detecting the primary visual areas. Bottom rows: IC representative of a vascular artifact. 



5. CONCLUSION 

This contribution presents a procedure for thresholding ICA 
patterns of fMRI time series to recover sparse sources using a 
multivariate model of spatially- sparse brain activity that does 
not rely on correlating with external stimuli. From a practical 
point of view, the main improvement over existing ICA-based 
methods for fMRI is that non-neuronal patterns are rejected as 
they do not correspond to very salient features. We have val- 
idated on simulated data and resting- state fMRI data that the 
procedure can yield exact control of the false positive rates 
for p > 10~^ and achieves better sensitivity/specificity trade- 
offs than the current state-of-art fMRI ICA support- selection 
procedures. Control of false detections and consistency of es- 
timation on noisy data is important for clinical and medical re- 
search applications of resting- state fMRI. Our procedure can 
be understood as outlier detection with projection pursuit, as 
proposed by Gnanadesikan and Kettenring [9], using ICA. 
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